40-Gb/s CLOCK AND DATA RECOVERY CIRCUIT IN 0.18 µm TECHNOLOGY



CROSS-REFERENCE TO RELATED APPLICATIONS 
This application claims priority under 35 U.S.C. §119(e) to co-pending
and commonly-assigned Provisional Application Serial No. 60/445,722, entitled
"A 40-GB/S CLOCK AND DATA RECOVERY CIRCUIT IN 0.18 µM CMOS
TECHNOLOGY," filed on February 7, 2003, by Jri Lee and Behzad Razavi,
attorney's docket number 30448.116-US-P1, which application is incorporated by
reference herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention.

The present invention relates generally to 40-Gb/s data, and in particular, to a
40-Gb/s clock and data recovery circuit in 0.18 µm CMOS technology.

2. Description of the Related Art.

(Note: This application references a number of different publications as indicated 
throughout the specification by reference numbers enclosed in brackets, e.g., [x]. A list 
of these different publications ordered according to these reference numbers can be found 
below in the section entitled "References." Each of these publications is incorporated by
reference herein.) 

Clock and data recovery (CDR) circuits operating at tens of gigabits per second 
pose difficult challenges with respect to speed, jitter, signal distribution, and power 
consumption. Half-rate 40-Gb/s CDR circuits have been implemented in bipolar
technology [1,2], but they require 5 V supplies and draw 1.6 to 5 watts of power. (The
work in [1] uses an external oscillator and 90° phase shifter.) On the other hand, the
recent integration of 10-Gb/s receivers in CMOS technology [3] encourages further
research on CMOS solutions for higher speeds, especially if it enables low-voltage,
low-power realization. The present invention comprises a design and experimental
verification of a 40-Gb/s phase-locked CDR circuit fabricated in 0.18-µm CMOS
technology.

BRIEF SUMMARY OF THE INVENTION
A 40-Gb/s clock and data recovery (CDR) circuit incorporates a quarter-rate
phase detector and a multi-phase voltage-controlled oscillator to re-time and de-multiplex
a 40-Gb/s input data signal into four 10-Gb/s output data signals. The circuit is fabricated
in 0.18 µm CMOS technology.



BRIEF DESCRIPTION OF THE DRAWINGS 
Referring now to the drawings in which like reference numbers represent 
corresponding parts throughout:

FIG. 1A is a block diagram that illustrates the architecture of a clock and data
recovery (CDR) circuit of the preferred embodiment of the present invention;

FIG. 1B is a timing diagram that illustrates the operation of the clock and data
recovery circuit in the preferred embodiment of the present invention;

FIG. 2A is a schematic that illustrates a half-quadrature voltage-controlled
oscillator according to the preferred embodiment of the present invention;

FIG. 2B is a schematic that illustrates a modification of FIG. 2A;

FIG. 2C is a schematic realization of a -Gm cell according to the preferred
embodiment of the present invention;

FIG. 3A is a schematic that illustrates a quarter-rate phase detector and
voltage-to-current converter according to the preferred embodiment of the present invention;

FIG. 3B is a graph that illustrates the characteristic operation of the quarter-rate
phase detector and voltage-to-current converter;

FIG. 4A depicts the master-slave flip-flop used in the phase detector according to
the preferred embodiment of the present invention;

FIG. 4B depicts an XOR gate used in the phase detector according to the
preferred embodiment of the present invention;

FIG. 5 is a micrograph that shows a photo of a die for the clock and data recovery
circuit that has been fabricated in a 0.18 µm CMOS technology;

FIG. 6A is a graph showing the tuning range of the voltage-controlled oscillator
according to the preferred embodiment of the present invention;

FIG. 6B is a graph showing the free-running spectrum of the voltage-controlled
oscillator according to the preferred embodiment of the present invention;

FIG. 7A is a graph that depicts the clock and data recovery circuit input and
output waveforms under locked condition in response to a pseudo-random sequence of
length 2^31 - 1; and

FIG. 7B is a graph that shows the recovered clock, suggesting an rms jitter of
1.756 ps and a peak-to-peak jitter of 9.67 ps.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
In the following description, reference is made to the accompanying drawings
which form a part hereof, and in which are shown, by way of illustration, several
embodiments of the present invention. It is understood that other embodiments may be
utilized and structural changes may be made without departing from the scope of the
present invention.

A. Clock and Data Recovery (CDR) Architecture
FIG. 1A is a block diagram that illustrates the architecture of a clock and data
recovery (CDR) circuit 10 according to the preferred embodiment of the present
invention. The CDR circuit 10 includes: (1) a multi-phase voltage-controlled oscillator
(VCO) 12 for accepting a control signal and for changing a frequency of a clock signal
output from the VCO 12 in response thereto, wherein the VCO 12 outputs a plurality of
phases of the clock signal; (2) a quarter-rate phase detector (PD) 14 for sampling an input
data signal using the phases of the clock signal received from the VCO 12 and generating
a plurality of data output signals in response thereto, wherein each of the data output
signals detects an edge or transition in the input data signal and whether the edge is early
or late with respect to its corresponding clock signal phase; (3) a Voltage-to-Current (V/I)
Converter 16 for converting the data output signals from the phase detector 14 to a
control current; and (4) a loop filter (LPF) 18 for integrating the control current from the
V/I Converter 16 and for outputting the control signal to the VCO 12 in response thereto.

Specifically, the circuit 10 accepts a single 40-Gb/s input data signal Din, and
re-times and de-multiplexes the input data signal Din into a plurality of 10-Gb/s output data
signals D1out, D2out, D3out and D4out. To accomplish this function, the PD 14 uses
half-quadrature phases of the clock signal CK provided by the VCO 12 to sample the input
data signal Din, thereby detecting the edges or transitions in the data input signal Din and
determining whether the clock signal CK is early or late. Specifically, four 10-GHz
phase offsets CK0, CK45, CK90 and CK135 of the clock signal are output from the VCO
12, wherein adjacent ones of the phase offsets CK0, CK45, CK90 and CK135 of the clock
signal are half-quadrature phases, i.e., are offset in phase by 45° as indicated by their
subscripts.

FIG. 1B is a timing diagram that illustrates the operation of the CDR circuit 10 in
the preferred embodiment of the present invention. The PD 14 uses both the leading and
trailing edges of the four 10-GHz phase offsets CK0, CK45, CK90 and CK135 of the clock
signal provided by the VCO 12 to sample the input data signal Din every 12.5
picoseconds (ps), in order to detect edges or transitions in the input data signal Din,
thereby re-timing and de-multiplexing the 40-Gb/s input data signal Din into the four
10-Gb/s output data signals D1out, D2out, D3out and D4out. The PD 14 also determines
whether the clock signal CK is early or late. Using this quarter-rate (10-Gb/s) sampling,
flip-flops (not shown) in the PD 14 have a hold time that can be four times as long as that
required at full-rate (40-Gb/s) operation, but their acquisition speed must still guarantee
correct sampling of the input data signal Din in less than 50 ps.
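
The 12.5-ps sampling interval follows directly from the figures given above: a 10-GHz clock has a 100-ps period, and four phases spaced 45° apart, each used on both its leading and trailing edge, yield eight sampling instants per period. The following is a minimal sketch of that arithmetic, not part of the disclosed circuit; the function name and constants are illustrative assumptions.

```python
# Illustrative sketch only: enumerates the sampling instants implied by
# four half-quadrature clock phases used on both edges.
CLOCK_GHZ = 10.0                 # VCO clock frequency stated in the text
PHASES_DEG = (0, 45, 90, 135)    # phase offsets CK0, CK45, CK90, CK135


def sampling_instants_ps():
    """Return sorted (time_ps, phase_deg, edge) tuples for one clock period."""
    period_ps = 1000.0 / CLOCK_GHZ           # 100 ps for a 10-GHz clock
    instants = []
    for phase in PHASES_DEG:
        rise = period_ps * phase / 360.0     # leading edge of this phase
        fall = rise + period_ps / 2.0        # trailing edge, half a period later
        instants.append((rise, phase, "rising"))
        instants.append((fall, phase, "falling"))
    return sorted(instants)
```

Calling `sampling_instants_ps()` gives instants 12.5 ps apart, i.e., two samples per 25-ps bit of the 40-Gb/s input.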
Note that, in the absence of edges or transitions in the input data signal Din, the
V/I Converter 16 generates no output current, leaving its control line to the LPF 18 and
VCO 12 undisturbed. Note also that the circuit 10 is fully differential, except for the
control line to the LPF 18 and VCO 12.

B. Components of the Clock and Data Recovery Circuit

1. Voltage-Controlled Oscillator
The speed, jitter, and driving capability required of the oscillator point to the use 
of an LC realization. A number of multi-phase LC oscillators have been reported. 
Coupled oscillators [4,5] operate away from the resonance frequency of the tanks so as to 
create the required phase shift, thus bearing a trade-off between reliability of oscillation
and the phase noise [5]. Furthermore, such topologies are prone to oscillation at more
than one frequency because they can satisfy gain and phase requirements at multiple
frequencies. The multi-phase oscillator in [6] drives transmission lines by a gain stage 
loaded by resistors, incurring energy loss in each cycle. 

FIG. 2A is a schematic that illustrates a half-quadrature VCO 12 according to the
preferred embodiment of the present invention, FIG. 2B is a schematic that illustrates a
modification of FIG. 2A, and FIG. 2C is a schematic realization of a -Gm cell according
to the preferred embodiment of the present invention.

The multi-phase VCO 12 introduced here is based on the concept of differential
stimulus of a closed-loop transmission line at equally-spaced points. As illustrated in
FIG. 2A, the loop circuit of the VCO 12 sustains a phase separation of 180° at
diagonally-opposite nodes, providing 45° phase steps in between for the clock signal, wherein the
nodes are labeled as 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. Unlike the topologies
in [5] and [7], this circuit does not operate away from the resonance frequency.

Moreover, the VCO 12 oscillation frequency is uniquely given by the travel time
of the wave around the loop. Also, in contrast to the design in [6], the transmission line
requires no termination resistors, thereby displaying lower phase noise and larger voltage
swings for a given power dissipation and inductor Q.

The topology of FIG. 2A nonetheless necessitates long interconnects between the
nodes and their corresponding -Gm cells. However, recognizing that diagonally-opposite
inductors L1, L2, L3, L4, L5, L6, L7 and L8 carry currents that are 180° out of phase, the
circuit can be modified as shown in FIG. 2B, wherein inductor elements of the VCO 12
are grouped into differential structures L1/L5, L2/L6, L3/L7 and L4/L8, and -Gm cells are
placed in close proximity to the nodes of the VCO 12.

Exploiting the higher Q of differential inductors [8], the VCO 12 uses a structure
for the -Gm cell as shown in FIG. 2C, thereby shaping the rising and falling edges by the
PMOS transistors M1, M2, M3 and M4, and hence lowering the up-conversion of 1/f noise
[9]. SpectreRF™ simulations indicate that, for a given power dissipation, inductor Q, and
frequency of oscillation, the proposed VCO 12 achieves twice the voltage swings and 12
dB lower phase noise than that in [6].

Using the structure of the -Gm cell shown in FIG. 2C, each differential port of the
VCO 12 is buffered by an inductively-loaded differential pair of switches M1-M3 and
M2-M4. These buffers perform the following functions: (1) isolate the VCO 12 from the long
interconnects to the PD 14 that would otherwise introduce greater uncertainty in the
oscillation frequency; (2) generate voltage swings above the supply voltage, thus driving
the flip-flops of the PD 14 efficiently; and (3) isolate the VCO 12 from the edges or
transitions coupled through the PD 14.

2. Phase Detector 
FIG. 3A is a schematic that illustrates a quarter-rate PD 14 and V/I Converter 16
according to the preferred embodiment of the present invention, and FIG. 3B is a graph
that illustrates the characteristic operation thereof. The PD 14 employs eight flip-flops 20
that sample the input data signal Din at 12.5-ps intervals based on the clock signal from
the VCO 12 using the phase offsets CK0, CK45, CK90 and CK135, wherein CK180,
CK225, CK270 and CK315 are the complements of CK0, CK45, CK90 and CK135,
respectively. The PD 14 also employs eight XOR
gates 22 that compare the outputs from adjacent or consecutive flip-flops 20. The V/I
Converter 16 employs four Level Converters (LE) 24 that generate a current level from
the combined output of the XOR gates 22 as the control line to the LPF 18 and VCO 12.

In a manner similar to an Alexander topology [10], the PD 14 compares every two
adjacent or consecutive samples stored by the adjacent or consecutive flip-flops 20 by
means of the associated XOR gate 22, which generates a net output current if the two
adjacent or consecutive samples are unequal, thereby indicating that an edge or transition
has occurred in the input data signal Din. When no edges or transitions occur, the
flip-flops 20 storing the two adjacent or consecutive samples produce equal outputs, the XOR
gate 22 outputs a zero, and the control line from the V/I Converter 16 has a zero current.

The early-late phase detection method used herein exhibits a bang-bang
characteristic, forcing the CDR circuit 10 to align every other edge of the clock signal CK
with the zero crossings of the input data signal Din under the locked condition. In reality,
the meta-stable behavior of the flip-flops 20 leads to a finite PD 14 gain, allowing the
clock signal CK edges to sustain some offset with respect to the zero crossings of the
input data signal Din.
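
The XOR comparison of consecutive samples can be illustrated with a short behavioral sketch. This is a generic Alexander-style early/late decision written for illustration, not the transistor-level circuit of FIG. 3A. The sample ordering (alternating bit-centre and bit-boundary samples) and the sign convention (+1 for "clock early", -1 for "clock late") are assumptions, as is the function name.

```python
# Behavioral sketch (assumed conventions, not the disclosed circuit).
def early_late_votes(samples):
    """Return one vote per bit interval from alternating centre/boundary samples.

    `samples` is a sequence of 0/1 values taken 12.5 ps apart, where even
    indices are assumed to fall near bit centres and odd indices near bit
    boundaries. For each (centre, boundary, centre) triple, two XORs compare
    adjacent samples, and their difference gives the vote:
      +1  boundary sample equals the earlier bit  -> clock early (assumed sign)
      -1  boundary sample equals the later bit    -> clock late  (assumed sign)
       0  no transition between the two bits
    """
    votes = []
    for i in range(0, len(samples) - 2, 2):
        a, t, b = samples[i], samples[i + 1], samples[i + 2]
        x1 = a ^ t          # XOR of first pair of adjacent samples
        x2 = t ^ b          # XOR of second pair of adjacent samples
        votes.append(x2 - x1)
    return votes
```

For example, `early_late_votes([1, 1, 1])` returns `[0]`, matching the statement above that equal samples produce no net output when the data has no transition.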

Shown in FIG. 3B is the input/output characteristic of the PD 14 together with the
V/I Converter 16, obtained by transistor-level simulations while the circuit 10 senses a
40-Gb/s random stream of the input data signal Din and eight phases of the 10-GHz clock
signal CK. For a phase error less than ±2.5 ps, the PD 14 displays a relatively constant
gain of 100 µA/ps.
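
As a rough numerical reading of this characteristic, the linear region can be modeled as a constant slope of 100 µA/ps over ±2.5 ps. The sketch below is an approximation for illustration only: the hard clamp outside the linear region and the sign of the output are assumptions, since the text states only the slope near zero phase error.

```python
# Piecewise-linear approximation of the PD + V/I characteristic (illustrative).
GAIN_UA_PER_PS = 100.0    # gain stated in the text for the linear region
LINEAR_RANGE_PS = 2.5     # extent of the linear region stated in the text


def control_current_ua(phase_error_ps):
    """Approximate V/I output current in microamps for a given phase error.

    Outside +/-2.5 ps the output is simply clamped; that saturation shape
    is an assumption and is not taken from the simulated curve.
    """
    clamped = max(-LINEAR_RANGE_PS, min(LINEAR_RANGE_PS, phase_error_ps))
    return GAIN_UA_PER_PS * clamped
```

Under these assumptions a 1-ps phase error maps to about 100 µA of control current.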

Even though the flip-flops 20 of the PD 14 operate with a 10-GHz clock signal
CK, proper sampling of the 40-Gb/s input data signal Din still requires fast recovery from
the previous state and rapid acquisition of the present input. To this end, both a wide
sampling bandwidth and a short clock signal CK transition time are necessary.

FIG. 4A is a schematic of the master-slave flip-flop 20 used in the PD 14
according to the preferred embodiment of the present invention. The master-slave
flip-flop 20 includes switches M1-M13 and Mb and capacitors C1-C2. The flip-flop 20 latches
the input data signal Din using the clock signal CK provided from the VCO 12 buffer, and
produces the data output signal Dout.

NMOS switches M1 and M2 sample the input data signal Din on the parasitic
capacitances at nodes X and Y when CK is high. Since the minimum input
common-mode (CM) level is dictated by the gate-source voltage of M3-M4 and the headroom
required by ISS, the sampling switches M1 and M2 experience only an overdrive voltage
of 0.5 V even if CK reaches VDD, failing to provide fast sampling. This issue is remedied
by setting the CM level of CK and its complement equal to VDD, a choice afforded by the
inductively-loaded stages of the VCO 12 buffer. The peak value of CK thus exceeds VDD
by 0.8 V, more than doubling the sampling speed of M1 and M2. The large clock swings
also minimize the transition times.

With large clock swings available, the current switching in pairs M5-M6, M7-M8
and M9-M10 is accomplished by gate control rather than conventional source-coupled
steering. The proposed topology offers two advantages: (1) since the tail current source is
removed, M11-M13 can be considerably narrower, presenting a smaller capacitance to the VCO 12
buffer; (2) since the drain currents of M11-M13 are not limited by a tail current source,
these transistors experience "class AB" switching, drawing a large current at the peak of
the clock swing and providing greater voltage swings and a higher gain in the data path.

FIG. 4B is a schematic of the XOR gate 22 used in the PD, along with the V/I
Converter 16 (i.e., the Level Converter 24), according to the preferred embodiment of the
present invention. The XOR gate 22 accepts signals a and b as input and includes
switches M1-M3, while the V/I Converter 16 accepts the output of the XOR gate 22 (as
well as the next XOR gate 22) as input and includes switches M4-M7. The V/I Converter
16 outputs the control line Vout to the LPF 18 (and then to the VCO 12).

The XOR gates 22 used in the PD 14 must exhibit symmetry with respect to their
two inputs and operate with a low supply voltage. The XOR gate 22 shown in FIG. 4B is
a modified version of that in [11], with transistors M2 and M3 forming local positive
feedback loops and avoiding the reference voltage necessary in the realization in [11].
The V/I Converter 16 copies the output current of the XOR gate 22, providing nearly
rail-to-rail swings for the control line Vout to the LPF 18 (and VCO 12).



C. Experimental Results 

The CDR circuit 10 of the present invention has been fabricated in a 0.18 µm
CMOS technology. FIG. 5 is a micrograph that shows a photo of the die, which measures
1.0 × 1.4 mm². The circuit is tested on a high-speed probe station with a 40-Gb/s
Anritsu™ random data generator providing the input.

Shown in FIG. 6A is the VCO 12 tuning range and shown in FIG. 6B is the
free-running spectrum. The VCO 12 provides a tuning range of 1.2 GHz with a free-running
phase noise of -105 dBc/Hz at 1-MHz offset.

FIG. 7A depicts the CDR circuit 10 input and output waveforms under locked
condition in response to a pseudo-random sequence of length 2^31 - 1. The de-multiplexed
data experiences some inter-symbol interference (ISI), but if further de-multiplexing is
included on the same chip, the ISI can be tolerated. FIG. 7B shows the recovered clock,
suggesting an rms jitter of 1.756 ps and a peak-to-peak jitter of 9.67 ps.

However, as shown in the inset, the oscilloscope itself suffers from rms and
peak-to-peak jitters of 1.508 ps and 8.89 ps, respectively. Thus, the CDR circuit 10 output
contains a jitter of 0.9 ps,rms and at most 9.67 ps,pp. (It is unclear whether and how the
peak-to-peak values can be subtracted.)
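
The 0.9-ps rms figure is consistent with subtracting the oscilloscope's rms jitter from the measured value in quadrature, which is the usual treatment when the two jitter contributions are assumed to be independent. The sketch below reproduces that arithmetic; the function name is illustrative, and the independence assumption is not stated in the text.

```python
import math


def intrinsic_rms_jitter_ps(measured_rms_ps, instrument_rms_ps):
    """Remove the instrument's rms jitter from a measured rms jitter.

    Assumes the two contributions are independent, so their variances add
    and the intrinsic value is the root of the variance difference.
    """
    if instrument_rms_ps > measured_rms_ps:
        raise ValueError("instrument jitter exceeds measured jitter")
    return math.sqrt(measured_rms_ps**2 - instrument_rms_ps**2)
```

With the measured 1.756 ps and the oscilloscope's 1.508 ps, this evaluates to roughly 0.9 ps. No comparable rule applies to peak-to-peak values, which is why the text reports 9.67 ps only as an upper bound.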

The performance of this work and some other previously-published CDR circuits
is summarized in Table 1. (The power dissipation noted here for the design in [2]
excludes their limiting amplifier and frequency detector contribution and was obtained
through private communication with M. Reinhold.)

Conclusion

This concludes the description of the preferred embodiment of the invention. The 
following describes some alternative embodiments for accomplishing the present 
invention. 

For example, the present invention could be used with many types of circuits, and 
not just those described herein. Moreover, any number of different components or 
different configurations of components could be used without departing from the scope of 
the present invention. Finally, any number of input data signals, phase offset clock 
signals and de-multiplexed output data signals could be generated by the present 
invention. 

The foregoing description of one or more embodiments of the invention has been 
presented for the purposes of illustration and description. It is not intended to be 
exhaustive or to limit the invention to the precise form disclosed. Many modifications 
and variations are possible in light of the above teaching. It is intended that the scope of 
the invention be limited not by this detailed description, but rather by the claims 
appended hereto. 